Draft

The information presented here is placed in the public domain, and was written by Doug Burke. The notebook used to create this page is available (empty or filled), and questions can be asked using the GitHub issues page or via Twitter: https://twitter.com/doug_burke.

It will probably make a lot-more sense if you have already read the previous entries in this "series".

Playing with Frames

In the following notebook I finally get around to trying out the Frames library, which is an attempt to provide data frame-like capabilities (from R and pandas/Python) in Haskell. This library is quite new, and at the time of writing, is not available on Hackage; a direct installation via GitHub is required:

% git clone https://github.com/acowley/Frames.git
% cabal sandbox add-source Frames
% cabal install Frames foldl lens-family --dry-run
% cabal install Frames foldl lens-family

Last time the notebook was run



In [1]:

    
import Data.Time
getCurrentTime









    





2015-04-23 19:46:24.050029 UTC

What data shall I use?

The following is strongly based on the Frames tutorial, but using the ARF from the previous notebook as the data source. As I haven't quite got around to writing a full FITS parser for Haskell just yet, I am going to "cheat" and convert the ARF to a CSV file using tools from CIAO:

copy the FITS file (src.arf) to an ASCII "almost-CSV" file (arf.csv)

% dmcopy src.arf "arf.csv[opt kernel=text,sep=',']"
% head -5 arf.csv 
#TEXT/SIMPLE
#,ENERG_LO,ENERG_HI,SPECRESP
0.3000000, 0.3100000, 4.874343
0.3100000, 0.3200000, 14.82926
0.3200000, 0.3300000, 21.30229

manually edit the file to remove the first line and fix up the column names

% head -5 arf.csv 
ENERG_LO,ENERG_HI,SPECRESP
0.3000000, 0.3100000, 4.874343
0.3100000, 0.3200000, 14.82926
0.3200000, 0.3300000, 21.30229
0.3300000, 0.3400000, 28.51495

For the purposes of this notebook the file is accessed as ../data/arf.csv.

Using the Frames library requires a selection of GHC extensions. When used in source code the form

{-# LANGUAGE extension1, ..., extensionN #-}

is used, but for the IHaskell notebook it's a bunch of :set -Xextension1 calls. It's not obvious if I need all these in this notebook.



In [1]:

    
-- taken from the tutorial: http://acowley.github.io/Frames/

{-# LANGUAGE ConstraintKinds, DataKinds, FlexibleContexts, GADTs,
             OverloadedStrings, PatternSynonyms, QuasiQuotes,
             ScopedTypeVariables, TemplateHaskell, TypeOperators,
             ViewPatterns #-}

:set -XConstraintKinds
:set -XDataKinds
-- :set -XFlexibleContexts
:set -XGADTs
-- :set -XOverloadedStrings
-- :set -XPatternSynonyms
-- :set -XQuasiQuotes
-- :set -XScopedTypeVariables
-- :set -XTemplateHaskell
-- :set -XTypeOperators
-- :set -XViewPatterns

Here are some of the modules that I will use later on:



In [2]:

    
import qualified Control.Foldl as L
import qualified Data.Foldable as F

import Control.Applicative ((<$>), (<*>), pure)

import Data.Proxy (Proxy(..))

import Lens.Family (view)

import Frames
import Frames.CSV (readTableOpt)

The "trick" of the Frames package - i.e. the way that it provides a typed data frame - is that data loading is separated into two parts:

determination of the column names and column types (e.g. the "schema" for the data)
reading in the data using this "schema" (this can be done in a "streaming" manner, i.e. chunks of rows at a time to support handling very-large files, or all the data can be read in one go)

So, the "evaluation" of the CSV file is handled by the tableTypes function, which is given the name of the type which will be used to represent a row, and the path to the CSV file (there's a variant where more control is provided to the user, but I'm using the "easy" version here).



In [3]:

    
tableTypes "ARF" "../data/arf.csv"

Once this has succeeded, the compiler will create an ARF type that represents the data in the file. This is handled by Template Haskell, which is a GHC extension that provides compile-time metaprogramming to Haskell. Since I am using IHaskell, the fact that this is done compile-time versus evaluation-time is somewhat hidden from me (until, if you are anything like me, you accidentally give tableTypes an invalid file path and then wonder why the Haskell kernel dies on you!).

The first argument has to be capitalized, since it represents the name of a Haskell type (which are syntactically restricted to starting with a capital letter). Let's look at what the compiler has worked out for the ARF type:



In [4]:

    
:opt no-pager



In [5]:

    
:info ARF









    









    




type ARF = Record '["ENERG_LO" :-> Double, "ENERG_HI" :-> Double, "SPECRESP" :-> Double] 	-- Defined at <interactive>:1:1</interactive>

So, a row is being modelled by a Record with what appears to be a list$^\dagger$ of three name :-> type values, which is how "named" fields are represented in Frames. This is a different approach to the standard Haskell record type which I used in the previous notebook to represent XXX-sometihng-or-other-very-interesting-XXX.

Some form of Heterogenous lists

XXX

$^\dagger$ This is not a "normal" list since it starts with "'[" rather than just "[". The single quote character makes all the difference, and indicates that this is a type-level list. If you look closely, you may just spot a few more single quote characters popping up in this notebook.

The names are the column names, and their types are all Double, which suggests that it's recognized that eash row has three elements: ENERG_LO, ENERG_HI, and SPECRESP, each of type Double.

XXX comment on "list-like" and Record

XXX got to here

The ARF type is not the only thing that the call to tableTypes has created. One of the other items is the set of options needed to parse the CSV file. In this case, the generated name is aRFParser (since it's a function or value it has to start with a lower-case character; in this particular case the resulting name is perhaps not-that elegant).



In [6]:

    
:t aRFParser









    




aRFParser :: ParserOptions

This parser is used with readTableOpt to read in the data. As the variable name (arfStream) suggests, this supports a streaming approach (i.e. is useful for handling large data sets, that you may not want to read in all in one go), built on the Pipes library.

XXX



In [7]:

    
arfStream = readTableOpt aRFParser "../data/arf.csv"

The type is intimidating...



In [8]:

    
:type arfStream









    




arfStream :: forall (m :: * -> *) (rs :: [*]). (ReadRec rs, MonadIO m) => Producer (Record rs) m ()

but fortunately we can ignore this and just convert it into an "in-memory" representation (in this case, an "Array of Structures") using the inCoreAoS function.



In [9]:

    
-- The signature is needed for the type inference to work
-- (i.e. the ":: IO (Frame ARF)" is needed).
--
arf <- inCoreAoS arfStream :: IO (Frame ARF)



In [10]:

    
:type arf









    




arf :: Frame ARF

So, what can I do with this?

Well, I can get back the column headers:



In [11]:

    
columnHeaders arf









    





["ENERG_LO","ENERG_HI","SPECRESP"]

Although not obvious from the above, this gets the column names from the type (ARF) and not the value (arf). One way of showing this is by passing in the type (or, at least, a representation of the type using the Proxy type), which returns the same as the previous command, but without needing the "value" (i.e. the actual data). This is different from a language like Python, where you'd get the headers from the object (i.e. the value). Using an interactive environment like this IHaskell notebook can blur the distinction - e.g. a compile-time error (as shown below) can look like the run-time error you'd get from Python - so it's good to remember that there's a difference (although, to be honest, a lot of code in this particular notebook is going to be rather different to what you'd seen in an equivalent Python session ;-).



In [12]:

    
columnHeaders (Proxy :: Proxy ARF)









    





["ENERG_LO","ENERG_HI","SPECRESP"]

I'm really impressed with this, but it's what's important is how easy it is to access and use all this information.

Let's start with the number of rows in the table:



In [13]:

    
frameLength arf

XXX TODO: later on will probably want to mention the Fold.length option, but only after talking about F.toList

XXX TODO: rework; should just display the first row, and mention what's going on a bit, and then show multiple rows. This requires a lot of reorganization.

and how about the first row (counting from 0):



In [104]:

    
frameRow arf 0









    





{ENERG_LO :-> 0.3, ENERG_HI :-> 0.31, SPECRESP :-> 4.874343}

The frameRow function extracts a row from a Frame, so when given a Frame a it will return a value of type a:



In [15]:

    
:type frameRow









    




frameRow :: forall r. Frame r -> Int -> r

which, when given arf, means that the value returned is an ARF:



In [16]:

    
:type frameRow arf 0









    




frameRow arf 0 :: ARF

The Show instance displays both the column (or field) names as well as the values. Can we select a column? Yes, but it's complicated. First I save the first row just to simplify later code:



In [17]:

    
row0 = frameRow arf 0

There's a bit of a cottage industry that's sprung up to explore improving (or replacing) Haskell's named records. The lens approach$^\dagger$ - named since it's all about "focusing in" on a field - is used here to extract a single element of the row. The view function is the "lens" function, and it' given a way of identifying the record we are interested in (here eNERGLO$^\ddagger$) along with the data type to be "inspected" (in this case the first row of the table).

$^\dagger$ There's several modules which include lens in their name, which can make navigating things a bit confusing.

$^\ddagger$ The tableTypes routine created a number of symbols related to the columns. In this example the choices it has made lead to visually-unsatisfying symbols such as eNERGLO here. This is down to the fact that the column names are all upper case, include the underscore character, and that Haskell symbols must start with a lower-case character. This should not be taken as a complaint of Frames, as there is always going to be a corner case when converting between labelling schemes, and I don't have a suggestion for how to improve on the current behavior!



In [18]:

    
view eNERGLO row0

0.3

The other columns can also be accessed:



In [20]:

    
view eNERGHI row0
view sPECRESP row0

Using view we can therefore access the row contents, but it's a bit awkward to quickly look at a group of rows. For that, I go back to frameRow:



In [23]:

    
printRow :: Int -> IO ()
printRow = print . frameRow arf

Anyone who read my last notebook will have noticed my tendency to write pipeline-style routines that take advantage of function composition (the . operator) and point-free syntax. I included the type signature for printRow but the compiler could have inferred it; it says that the routine accepts an integer argument (which is going to be passed to frameRow) to create a row (so, in this case a value of type ARF), and then it is passed to print, which converts the row to a string and prints it to the screen$^\dagger$.

$^\dagger$ Remember that, when using function composition like this, you read from right to left, since f . g . h is the same as f(g(h(x)), and that the argument, x, is implicit.

For example:



In [25]:

    
printRow 1









    





{ENERG_LO :-> 0.31, ENERG_HI :-> 0.32, SPECRESP :-> 14.82926}

In the notebook this doesn't look any different to calling frameRow arf 1, but that's because the notebook automatically displays any return value (if its type is an instance of the Show typeclass). If I were to "capture" the return value then this would not happen as in the following:



In [26]:

    
row1 = frameRow arf 1

Now, print has an interesting type:



In [27]:

    
:type print









    




print :: forall a. Show a => a -> IO ()

Firstly, the Show constraint - needed to be able to convert the value to display into a string - and secondly, the return value, which is IO (). The value arf has a type Frame ARF, which can be thought of as saying it has something to do with the ARF type but also provides some "context" (in this case the fact that there's more than one). For IO (), the "value type" is () and the "context" is IO. Note that "context" here really means the semantics provided by the typeclasses that the type is an instance of (e.g. if it is an instance of Show then it can be converted to a string, if a Functor then it is like a single-element container), except for a small wrinkle, in that the IO type is how Haskell provides input and output (hence the name ;-). The () type is known as unit and it has only one value, which is (), that is



In [28]:

    
:type ()









    




() :: ()

It's a way of representing "nothing", and I like to think of it as a 0-dimensional item, reminiscent of the God of Pointland.

Putting all of this together explains the type of printRow, namely Int -> IO (). If I wanted to apply this to several rows - say, 9, 12, and 25, I might be tempted to use map, but it wouldn't cause the rows to be printed. Instead, in the notebook, you get a complaint about IO () - which is the return value of printRow - being unshowable.



In [31]:

    
map printRow [9, 12, 25]









    




Unshowable:IO ()No instance for (Show (IO ())) arising from a use of ‘print’
In a stmt of an interactive GHCi command: print it

This is because the I/O provided by the IO type - in this case, the display of the contents to the screen - is actually a "side effect" of the function, and only happens when the value is evaluated. The map function creates a list of "actions", one for each row, as shown by the type being [IO ()], but does not actually "evaluate" (or run) them. This is really frustrating and surprising and ... when coming from imperative-style languages like Python, C, or Java, but is an important part of how Haskell allows you to reason about your code$^\dagger$.

$^\dagger$ I'm not actually going to back up this statement in this notebook (there are other resources, such as the Input and Output section of the Learn You A Haskell book, or the Tackling the Awkward Squad tutorial by Simon Peyton Jones)



In [30]:

    
:type map printRow [9, 12, 25]









    




map printRow [0, 1, 2] :: [IO ()]

There are several ways to "run" these actions, but the simplest way here is to replace map by mapM_:



In [32]:

    
mapM_ printRow [9, 12, 25]









    





{ENERG_LO :-> 0.39, ENERG_HI :-> 0.4, SPECRESP :-> 69.30855}
{ENERG_LO :-> 0.42, ENERG_HI :-> 0.43, SPECRESP :-> 94.29903}
{ENERG_LO :-> 0.55, ENERG_HI :-> 0.56, SPECRESP :-> 161.4068}

It can actually be really useful to pass around "I/O" actions - that is, create them in one place and send them to a different part of the program to be executed later - but that's not for today's notebook!

XXX It would be nice to be able to convert to a list, so that Haskell list routines - e.g. mapM_ - could be used. This can be used to introduce F.toList.

With the above, I can display the first 3 rows of arf by saying

printN 3 arf

The routine works by converting the "Frame" arf into a list of records via the toList routine from Data.Foldable (which can be thought of as providing parts of a container-like abstraction in Haskell, and is often used with the Data.Traversable module, which I'll be talking about below) ). This list is then subsetted using take, which just grabs the first n items of a list (or the whole list if n is larger than the length), and then passed to the mapM_ print routine. This acts like the standard map routine, in that it applies print to each value, but we need to use mapM_ here because print has to be run in a "special" way (that is, it needs some context, such as the output handle to use) and produces a side effect (namely, displaying the value, after converting it to a string). Since we are actually only interested in this side effect, we end up throwing away the value returned by print (which is why I used the mapM_ variant, rather than the version without the trailing underscore - namely mapM - which would have returned something).

Note that printN does not know it is being sent a Frame since it's signature is very generic, namely:

XXX OLD So, let's try to look at the column data. I start with the first 5 rows:



In [88]:

    
mapM_ print (take 5 (F.toList arf))









    





{ENERG_LO :-> 0.3, ENERG_HI :-> 0.31, SPECRESP :-> 4.874343}
{ENERG_LO :-> 0.31, ENERG_HI :-> 0.32, SPECRESP :-> 14.82926}
{ENERG_LO :-> 0.32, ENERG_HI :-> 0.33, SPECRESP :-> 21.30229}
{ENERG_LO :-> 0.33, ENERG_HI :-> 0.34, SPECRESP :-> 28.51495}
{ENERG_LO :-> 0.34, ENERG_HI :-> 0.35, SPECRESP :-> 35.39883}

Before I explain this, I am going to write a small function to make this easier to use, as I'll be using it again soon:



In [89]:

    
printN n = mapM_ print . take n . F.toList

printN 3 arf

Remember that, when using function composition like this, you read from right to left, since f . g . h is the same as f(g(h(x)) (and that the argument, x, is implicit). The routine works by converting the "Frame" arf into a list of records via the toList routine from Data.Foldable (which can be thought of as providing parts of a container-like abstraction in Haskell, and is often used with the Data.Traversable module, which I'll be talking about below) ). This list is then subsetted using take, which just grabs the first n items of a list (or the whole list if n is larger than the length), and then passed to the mapM_ print routine. This acts like the standard map routine, in that it applies print to each value, but we need to use mapM_ here because print has to be run in a "special" way (that is, it needs some context, such as the output handle to use) and produces a side effect (namely, displaying the value, after converting it to a string). Since we are actually only interested in this side effect, we end up throwing away the value returned by print (which is why I used the mapM_ variant, rather than the version without the trailing underscore - namely mapM - which would have returned something).

Note that printN does not know it is being sent a Frame since it's signature is very generic, namely:



In [90]:

    
:type printN









    




printN :: forall a (t :: * -> *). (Foldable t, Show a) => Int -> t a -> IO ()

which just says that the second argument must be something that is an instance of the Foldable typeclass (this is so that toList is defined), and that it contains a type that is Showable (so that print) can work. The result is IO (), which is something in the IO context (in other words, it can do any sort of input or output, such as print something to the screen or delete your home directory), and it has the value of (), which is known in Haskell as unit. It's a way of representing "nothing", since it's type - which is also called ()- has only one value, namely (). Another way to thnk about it is that it is the empty tuple, so it's like a 0-dimensional thing (and so is reminiscent of the God of Pointland.

Anyway, back to looking at the data: we can also look at the last 5 elements, which requires knowing the number of rows in the frame. Fortunately, this is provided by frameLength:



In [92]:

    
frameLength arf

If I were using the most-recnt version of ghc - namely 7.10.0 - then the Foldable class contains a length routine, so I should also be able to say F.length arf, but I am using ghc 7.8.4 (it's also not guaranteed that the the Foldable version would be as efficient as frameLength).

With this, I can then get the last 5 rows by using array indexing and the frameRow function:



In [93]:

    
:type frameRow









    




frameRow :: forall r. Frame r -> Int -> r

which provides row-level access. So the first row can be accessed via:



In [94]:

    
frameRow arf 0









    





{ENERG_LO :-> 0.3, ENERG_HI :-> 0.31, SPECRESP :-> 4.874343}



In [96]:

    
nrows = frameLength arf

With this I can create a list of the last 5 indices using the syntax [a .. b] which creates values a, a+1, a+2, ... up to, and including, b. These can then be applied to frameRow, one at a time, and the result passed though to print, using mapM_:



In [97]:

    
mapM_ (print . frameRow arf) [nrows - 5 .. nrows - 1]









    





{ENERG_LO :-> 10.95, ENERG_HI :-> 10.96, SPECRESP :-> 0.6451889}
{ENERG_LO :-> 10.96, ENERG_HI :-> 10.97, SPECRESP :-> 0.6250272}
{ENERG_LO :-> 10.97, ENERG_HI :-> 10.98, SPECRESP :-> 0.6049552}
{ENERG_LO :-> 10.98, ENERG_HI :-> 10.99, SPECRESP :-> 0.5852773}
{ENERG_LO :-> 10.99, ENERG_HI :-> 11.0, SPECRESP :-> 0.5657126}

So, using this, I could create a function which takes a list of indexes and displays their values:



In [98]:

    
printIds = mapM_ (print . frameRow arf)

Note that the inferred type is much less general than printN; all we know by looking at it is that some sort of IO could happen when given a list of integers.



In [99]:

    
:type printIds









    




printIds :: [Int] -> IO ()

This can be used to print out a subset of data, for instance every 100 rows (it always takes me time to remember that a,b..c is a, b, a+2*(b-a), ..c, so is similar to, but not-quite the same as, NumPy's arange(a,c,b-a)):



In [100]:

    
printIds [0, 100..nrows]









    





{ENERG_LO :-> 0.3, ENERG_HI :-> 0.31, SPECRESP :-> 4.874343}
{ENERG_LO :-> 1.3, ENERG_HI :-> 1.31, SPECRESP :-> 632.5857}
{ENERG_LO :-> 2.3, ENERG_HI :-> 2.31, SPECRESP :-> 371.9744}
{ENERG_LO :-> 3.3, ENERG_HI :-> 3.31, SPECRESP :-> 415.4583}
{ENERG_LO :-> 4.3, ENERG_HI :-> 4.31, SPECRESP :-> 408.3406}
{ENERG_LO :-> 5.3, ENERG_HI :-> 5.31, SPECRESP :-> 302.1592}
{ENERG_LO :-> 6.3, ENERG_HI :-> 6.31, SPECRESP :-> 178.7527}
{ENERG_LO :-> 7.3, ENERG_HI :-> 7.31, SPECRESP :-> 80.63524}
{ENERG_LO :-> 8.3, ENERG_HI :-> 8.31, SPECRESP :-> 32.38216}
{ENERG_LO :-> 9.3, ENERG_HI :-> 9.31, SPECRESP :-> 18.44848}
{ENERG_LO :-> 10.3, ENERG_HI :-> 10.31, SPECRESP :-> 4.737}

What happens with an invalid index?



In [101]:

    
printIds [nrows]









    




./Data/Vector/Generic.hs:249 ((!)): index out of bounds (1070,1070)

This is a runtime error, since the type system isn't able to figure out that the array index here is invalid. There are several ways to "improve" on this, such as having an array access routine that returns Maybe a rather than a, to use some form of dependent-type based approach to tracking the constraint, or the (less confusing to me) refinement types approach, but for now we'll have to remember that just because you have a really-good type system, you can still make mistakes.

XXXX got to here XXXX

XXX old ish

How about selecting some of the data?



In [14]:

    
:type view









    




view :: forall b a a' b'. FoldLike b a a' b b' -> a -> b

Argh: my eyes - sPECRESP is not a visually-engaging symbol...



In [15]:

    
:type view sPECRESP









    




view sPECRESP :: forall (rs0 :: [*]). RElem SPECRESP rs0 (RIndex SPECRESP rs0) => Record rs0 -> Double



In [16]:

    
:type view sPECRESP <$> arf









    




view sPECRESP <$> arf :: Frame Double



In [17]:

    
view sPECRESP <$> arf









    




Unshowable:Frame DoubleNo instance for (Show (Frame Double)) arising from a use of ‘print’
In a stmt of an interactive GHCi command: print it

So, this is "projecting" out the SPECRESP field from the row, creating a frame with only this content:



In [18]:

    
mapM_ print (take 3 (F.toList (view sPECRESP <$> arf)))

How about summary statistics? This can often achieved with a fold over the structure:



In [20]:

    
:type L.fold
:type L.fold L.minimum
:type L.fold L.minimum (view sPECRESP <$> arf)









    




L.fold :: forall a b (f :: * -> *). Foldable f => Fold a b -> f a -> b






    




L.fold L.minimum :: forall a (f :: * -> *). (Foldable f, Ord a) => f a -> Maybe a






    




L.fold L.minimum (view sPECRESP <$> arf) :: Maybe Double



In [21]:

    
L.fold L.minimum (view sPECRESP <$> arf)
L.fold L.maximum (view sPECRESP <$> arf)









    





Just 0.5657126






    





Just 672.1996

Note that the following does not evaluate the structure twice (once for minimum and once for maximum):



In [22]:

    
-- from the tutorial
minMax :: Ord a => L.Fold a (Maybe a, Maybe a)
minMax = (,) <$> L.minimum <*> L.maximum



In [23]:

    
L.fold minMax (view sPECRESP <$> arf)









    





(Just 0.5657126,Just 672.1996)



In [24]:

    
:type L.pretraverse sPECRESP









    




L.pretraverse sPECRESP :: forall r (rs0 :: [*]). RElem SPECRESP rs0 (RIndex SPECRESP rs0) => Fold Double r -> Fold (Record rs0) r



In [25]:

    
:type L.pretraverse sPECRESP minMax









    




L.pretraverse sPECRESP minMax :: forall (rs0 :: [*]). RElem SPECRESP rs0 (RIndex SPECRESP rs0) => Fold (Record rs0) (Maybe Double, Maybe Double)



In [26]:

    
L.fold (L.pretraverse sPECRESP minMax) arf









    





(Just 0.5657126,Just 672.1996)

Note that in the following the columns include their names (or, rather, the fields in each record). This can be compared to earlier when view sPECRESP was used to "extract" out the contents of the SPECRESP field, but losing the name):



In [27]:

    
printN 3 arf









    





{ENERG_LO :-> 0.3, ENERG_HI :-> 0.31, SPECRESP :-> 4.874343}
{ENERG_LO :-> 0.31, ENERG_HI :-> 0.32, SPECRESP :-> 14.82926}
{ENERG_LO :-> 0.32, ENERG_HI :-> 0.33, SPECRESP :-> 21.30229}



In [28]:

    
printN 3 (view sPECRESP <$> arf)

How about just selecting a single column but leaving the "named field" part:



In [29]:

    
:info SPECRESP









    









    




type SPECRESP = "SPECRESP" :-> Double 	-- Defined at <interactive>:1:1</interactive>



In [30]:

    
select (Proxy::Proxy '[SPECRESP]) $ frameRow arf 0









    





{SPECRESP :-> 4.874343}



In [31]:

    
sprespOnly :: ARF -> Record '[SPECRESP]
sprespOnly = rcast



In [32]:

    
printN 3 (fmap sprespOnly arf)









    





{SPECRESP :-> 4.874343}
{SPECRESP :-> 14.82926}
{SPECRESP :-> 21.30229}



In [33]:

    
nrows = frameLength arf
mapM_ (print . frameRow arf) [nrows - 3 .. nrows - 1]









    





{ENERG_LO :-> 10.97, ENERG_HI :-> 10.98, SPECRESP :-> 0.6049552}
{ENERG_LO :-> 10.98, ENERG_HI :-> 10.99, SPECRESP :-> 0.5852773}
{ENERG_LO :-> 10.99, ENERG_HI :-> 11.0, SPECRESP :-> 0.5657126}



In [34]:

    
-- subsets
mapM_ print (take 5 (filter ((>= 600) . view sPECRESP) (F.toList arf)))









    





{ENERG_LO :-> 1.18, ENERG_HI :-> 1.19, SPECRESP :-> 601.2171}
{ENERG_LO :-> 1.19, ENERG_HI :-> 1.2, SPECRESP :-> 604.2618}
{ENERG_LO :-> 1.2, ENERG_HI :-> 1.21, SPECRESP :-> 607.2027}
{ENERG_LO :-> 1.21, ENERG_HI :-> 1.22, SPECRESP :-> 610.0424}
{ENERG_LO :-> 1.22, ENERG_HI :-> 1.23, SPECRESP :-> 612.9407}



In [35]:

    
-- would be nice to have a HTML representation of the table;
-- how to get column names (see above)?
arf









    




Unshowable:Frame ARFNo instance for (Show (Frame ARF)) arising from a use of ‘print’
In a stmt of an interactive GHCi command: print it



In [36]:

    
-- and how to extract the data to plot it? Frames has plot examples



In [37]:

    
:t recUncons









    




recUncons :: forall (s :: Symbol) a (rs :: [*]). Record ((s :-> a) : rs) -> (a, Record rs)



In [38]:

    
-- this gives the value, but not the name
(a1,b1) = recUncons (frameRow arf 0)
(a2,b2) = recUncons b1
(a3,b3) = recUncons b2

(a1,a2,a3)









    





(0.3,0.31,4.874343)



In [39]:

    
b1
b2
b3









    





{ENERG_HI :-> 0.31, SPECRESP :-> 4.874343}






    





{SPECRESP :-> 4.874343}






    





{}



In [40]:

    
-- as expected, this falls over
recUncons b3









    




Couldn't match type ‘'[]’ with ‘(s0 :-> a) : rs’
Expected type: Record ((s0 :-> a) : rs)
  Actual type: Record '[]
Relevant bindings include it :: (a, Record rs) (bound at :1:1)
In the first argument of ‘recUncons’, namely ‘b3’
In the expression: recUncons b3



In [41]:

    
showFields (frameRow arf 0)









    





["0.3","0.31","4.874343"]



In [42]:

    
toVinyl (frameRow arf 0)









    





{0.3, 0.31, 4.874343}



In [43]:

    
:type toVinyl (frameRow arf 0)









    




toVinyl (frameRow arf 0) :: Rec Identity (UnColumn '["ENERG_LO" :-> Double, "ENERG_HI" :-> Double, "SPECRESP" :-> Double])



In [44]:

    
:type frameRow









    




frameRow :: forall r. Frame r -> Int -> r



In [45]:

    
:info ARF









    









    




type ARF = Record '["ENERG_LO" :-> Double, "ENERG_HI" :-> Double, "SPECRESP" :-> Double] 	-- Defined at <interactive>:1:1</interactive>

Let's leave making a nice HTML table for now.

From Frames.Col:

instance forall s a. (KnownSymbol s, Show a) => Show (s :-> a) where
  show (Col x) = symbolVal (Proxy::Proxy s)++" :-> "++show x



In [46]:

    
:info ENERGLO









    









    




type ENERGLO = "ENERG_LO" :-> Double 	-- Defined at <interactive>:1:1</interactive>



In [47]:

    
type ENERGY = "ENERGY" :-> Double



In [48]:

    
rnew :: Record '[ENERGY, ENERGLO, ENERGHI, SPECRESP]
rnew = frameCons (pure 0.305) (frameRow arf 0)



In [49]:

    
rnew









    





{ENERGY :-> 0.305, ENERG_LO :-> 0.3, ENERG_HI :-> 0.31, SPECRESP :-> 4.874343}



In [50]:

    
:info ARF









    









    




type ARF = Record '["ENERG_LO" :-> Double, "ENERG_HI" :-> Double, "SPECRESP" :-> Double] 	-- Defined at <interactive>:1:1</interactive>



In [51]:

    
:type rnew









    




rnew :: Record '[ENERGY, ENERGLO, ENERGHI, SPECRESP]



In [52]:

    
-- would it be easy to add in the column after ENERGHI, say?
type ARFmod = Record '[ENERGY, ENERGLO, ENERGHI, SPECRESP]



In [53]:

    
-- Ideally would only require the lo/hi energy fields;
-- => addEnergyCol2
addEnergyCol :: Record '[ENERGLO, ENERGHI, SPECRESP] -> ARFmod
addEnergyCol r = 
  let elo = view eNERGLO r
      ehi = view eNERGHI r
      emid = (elo + ehi) / 2
  in frameCons (pure emid) r



In [54]:

    
addEnergyCol2 :: 
  (ENERGLO ∈ rs, ENERGHI ∈ rs) 
  => Record rs
  -> Record (ENERGY ': rs)
addEnergyCol2 r = 
  let elo = view eNERGLO r
      ehi = view eNERGHI r
      emid = (elo + ehi) / 2
  in frameCons (pure emid) r



In [55]:

    
:type frameCons









    




frameCons :: forall (f :: * -> *) a (rs :: [*]) (s :: Symbol). Functor f => f a -> Rec f rs -> Rec f ((s :-> a) : rs)



In [56]:

    
{-

Not sure this is a worthwhile avenue

-- try and avoid types where necessary; I was hoping just to be able to give
-- a type to the new column, but it isn't obvious if it's possible...
addEnergyCol3 r = 
  let elo = view eNERGLO r
      ehi = view eNERGHI r
      emid = (elo + ehi) / 2
      newcol :: ENERGY ????XXXXX what goes here
      newcol = pure emid
  in frameCons newcol r
  
  
-}



In [57]:

    
addEnergy :: Frame ARF -> Frame ARFmod
addEnergy = fmap addEnergyCol



In [58]:

    
import qualified Pipes.Prelude as P
import Pipes hiding (Proxy)



In [59]:

    
{-

writers :: (Occupation ∈ rs, Monad m) => Pipe (Record rs) (Record rs) m r
writers = P.filter ((== "writer") . view occupation)

-}

-- for now be very specific with the types
addEnergyP :: Monad m => Pipe ARF ARFmod m r
addEnergyP = P.map addEnergyCol



In [60]:

    
runEffect (arfStream >-> addEnergyP >-> P.take 6 >-> P.print)









    





{ENERGY :-> 0.305, ENERG_LO :-> 0.3, ENERG_HI :-> 0.31, SPECRESP :-> 4.874343}
{ENERGY :-> 0.315, ENERG_LO :-> 0.31, ENERG_HI :-> 0.32, SPECRESP :-> 14.82926}
{ENERGY :-> 0.325, ENERG_LO :-> 0.32, ENERG_HI :-> 0.33, SPECRESP :-> 21.30229}
{ENERGY :-> 0.335, ENERG_LO :-> 0.33, ENERG_HI :-> 0.34, SPECRESP :-> 28.51495}
{ENERGY :-> 0.345, ENERG_LO :-> 0.34, ENERG_HI :-> 0.35, SPECRESP :-> 35.39883}
{ENERGY :-> 0.355, ENERG_LO :-> 0.35, ENERG_HI :-> 0.36, SPECRESP :-> 41.54232}



In [61]:

    
arfMod = addEnergy arf



In [62]:

    
printN 6 arfMod









    





{ENERGY :-> 0.305, ENERG_LO :-> 0.3, ENERG_HI :-> 0.31, SPECRESP :-> 4.874343}
{ENERGY :-> 0.315, ENERG_LO :-> 0.31, ENERG_HI :-> 0.32, SPECRESP :-> 14.82926}
{ENERGY :-> 0.325, ENERG_LO :-> 0.32, ENERG_HI :-> 0.33, SPECRESP :-> 21.30229}
{ENERGY :-> 0.335, ENERG_LO :-> 0.33, ENERG_HI :-> 0.34, SPECRESP :-> 28.51495}
{ENERGY :-> 0.345, ENERG_LO :-> 0.34, ENERG_HI :-> 0.35, SPECRESP :-> 35.39883}
{ENERGY :-> 0.355, ENERG_LO :-> 0.35, ENERG_HI :-> 0.36, SPECRESP :-> 41.54232}

Can I write a model function that requires an ARF-like record and calculates a model?



In [63]:

    
-- Note, since the "columns" are typed, in that SPECRESP/...
-- are Doubles, this is not polymorphic.
--
-- This is an "integrated" model
powerLaw1 ::   
  (ENERGLO ∈ rs, ENERGHI ∈ rs, SPECRESP ∈ rs) 
  => Double      -- ^ amplitude at 1 keV
  -> Double      -- ^ gamma
  -> Record rs
  -> Double      -- ^ units of the amplitude * Energy
powerLaw1 norm gamma rs = 
  let elo = view eNERGLO rs
      ehi = view eNERGHI rs
      p   = 1 - gamma
      val = if gamma == 1
            then log (ehi/elo)
            else (ehi**p - elo**p) / p
             
  in norm * val

Perhaps I want to add the model value to the record rather than just return a Double; in this case I think the interface would be that there's a function to do this, and it accepts a generic model (with an interface more like the ones shown before, i.e. something that doesn't know about records).



In [64]:

    
type IModelFunc = Double -> Double -> Double

type IModel = "IModel" :-> Double



In [65]:

    
evalModel :: 
  (ENERGLO ∈ rs, ENERGHI ∈ rs, SPECRESP ∈ rs) 
  => IModelFunc
  -> Record rs
  -> Double
evalModel f r = 
  let elo = view eNERGLO r
      ehi = view eNERGHI r
  in f elo ehi
  
addModel :: 
  (ENERGLO ∈ rs, ENERGHI ∈ rs, SPECRESP ∈ rs) 
  => IModelFunc
  -> Record rs
  -> Record (IModel ': rs)
addModel f r = frameCons (pure (evalModel f r)) r



In [66]:

    
powerLaw :: 
  Double         -- ^ normalization 
  -> Double      -- ^ gamma 
  -> IModelFunc
powerLaw norm gamma elo ehi | gamma == 1 = norm * log (ehi / elo)
                            | otherwise  = let p = 1 - gamma
                                           in norm * (ehi**p - elo**p) / p



In [67]:

    
model = fmap (addModel (powerLaw 1.0 2.0)) arf



In [68]:

    
printN 6 model









    





{IModel :-> 0.10752688172043001, ENERG_LO :-> 0.3, ENERG_HI :-> 0.31, SPECRESP :-> 4.874343}
{IModel :-> 0.10080645161290347, ENERG_LO :-> 0.31, ENERG_HI :-> 0.32, SPECRESP :-> 14.82926}
{IModel :-> 9.469696969696972e-2, ENERG_LO :-> 0.32, ENERG_HI :-> 0.33, SPECRESP :-> 21.30229}
{IModel :-> 8.912655971479522e-2, ENERG_LO :-> 0.33, ENERG_HI :-> 0.34, SPECRESP :-> 28.51495}
{IModel :-> 8.403361344537785e-2, ENERG_LO :-> 0.34, ENERG_HI :-> 0.35, SPECRESP :-> 35.39883}
{IModel :-> 7.936507936507953e-2, ENERG_LO :-> 0.35, ENERG_HI :-> 0.36, SPECRESP :-> 41.54232}

Let's just get the energy and integrated-model value:



In [69]:

    
-- as no longer have a singleton list, do not need the quote before the []
-- but leave in for now.
--
smodel = fmap (select (Proxy::Proxy '[ENERGY, IModel]) . addEnergyCol2) model



In [70]:

    
printN 3 smodel









    





{ENERGY :-> 0.305, IModel :-> 0.10752688172043001}
{ENERGY :-> 0.315, IModel :-> 0.10080645161290347}
{ENERGY :-> 0.325, IModel :-> 9.469696969696972e-2}

So, how do I access these new columns? The tableTypes routine set up accessor symbols - e.g. eNERG_LO - but I haven't created them for these new columns. So, I can do it with some proxies:



In [72]:

    
eNERGY = rlens (Proxy :: Proxy ENERGY)
iModel = rlens (Proxy :: Proxy IModel)



In [73]:

    
view eNERGY (frameRow smodel 0)
view iModel (frameRow smodel 0)









    





0.305






    





0.10752688172043001

XXX can I get a vector out? Do I want to here?

XXX plot the data



In [74]:

    
import IHaskell.Display
import Graphics.Rendering.Chart.Backend.Diagrams

-- Hide view since I want to use the Frames-provided version here
import Graphics.Rendering.Chart.Easy hiding (view)

instance IHaskellDisplay (Renderable a) where
    display renderable = renderableToSVG renderable 450 300 >>= display . fst



In [75]:

    
displayIModel :: (ENERGY ∈ rs, IModel ∈ rs) => Frame (Record rs) -> Renderable ()
displayIModel frm = toRenderable (do
  let getCS rs = (view eNERGY rs, LogValue (view iModel rs))
      cs = F.toList (fmap getCS frm) 
      
  layout_title .= "Model"
  layout_x_axis . laxis_title .= "E (keV)"
  layout_y_axis . laxis_title .= "photon/cm^2/s"
  
  plot (line "" [cs])
  )



In [76]:

    
displayIModel smodel

XXX TODO: do a "per kev" version. for that really need the energ_lo/hi bins ...

XXX could there be a simple "frame plot" command that takes two columns and plots them, using XXX column headers as labels



In [77]:

    
displayModel :: (ENERGLO ∈ rs, ENERGHI ∈ rs, IModel ∈ rs) => Frame (Record rs) -> Renderable ()
displayModel frm = toRenderable (do
  let getCS rs = let elo = view eNERGLO rs
                     ehi = view eNERGHI rs
                     mdl = view iModel rs 
                 in (0.5*(elo+ehi), LogValue (mdl / (ehi-elo)))
                 
      cs = F.toList (fmap getCS frm) 
      
  layout_title .= "Model"
  layout_x_axis . laxis_title .= "E (keV)"
  layout_y_axis . laxis_title .= "photon/cm^2/s/keV"
  
  plot (line "" [cs])
  )



In [78]:

    
displayModel model

The type safety comes into play when you try to apply the wrong routine for a frame (or give the wrong frame to a plot function):



In [79]:

    
displayModel smodel









    




No instance for (RElem ("ENERG_LO" :-> Double) '[] (Data.Vinyl.TypeLevel.RIndex ("ENERG_LO" :-> Double) '[])) arising from a use of ‘displayModel’
In the expression: displayModel smodel
In an equation for ‘it’: it = displayModel smodel



In [80]:

    
displayIModel model









    




No instance for (RElem ("ENERGY" :-> Double) '[] (Data.Vinyl.TypeLevel.RIndex ("ENERGY" :-> Double) '[])) arising from a use of ‘displayIModel’
In the expression: displayIModel model
In an equation for ‘it’: it = displayIModel model



In [81]:

    
import GHC.TypeLits (KnownSymbol)



In [82]:

    
-- | Plot up the first two columns (as x and y)
--
--   column selection is left to the caller - e.g. via judicious calls to `select`
--
splot :: 
  (PlotValue a, PlotValue b, KnownSymbol sx, KnownSymbol sy, ColumnHeaders rs) 
  => (x -> a)
  -> (y -> b)
  -> Frame (Record (sx :-> x ': sy :-> y ': rs))
  -> Renderable ()
splot px py frm = toRenderable (do
  let getxy rs = let (x, yrs) = recUncons rs
                     (y, _)   = recUncons yrs
                 in (px x, py y)
                 
      cs = F.toList (fmap getxy frm)
      
      (xlbl:ylbl:_) = columnHeaders frm

  layout_x_axis . laxis_title .= xlbl
  layout_y_axis . laxis_title .= ylbl
  plot (line "" [cs])
  )



In [83]:

    
splot id id (fmap (select [pr|ENERGY,SPECRESP|]) arfMod)



In [84]:

    
splot LogValue LogValue (fmap (select [pr|ENERGY,IModel|]) smodel)

XXXX TODO: finish writing this ;-)

The end

There you go; I hope you enjoyed it. If you have any questions, then please use the GitHub issues page or contact me on Twitter at https://twitter.com/doug_burke.